ABSTRACT

The use of Guttman weights in scoring tests is
discussed. Scores of 2,500 men on one subtest of the CEEB-SAT-Verbal
Test were examined using cross-validated Guttman weights. Several
scores were compared, as follows: Scores obtained from
cross-validated Guttman weights; Scores obtained by rounding the
Guttman weights to one digit, ranging from 0 to 8; Scores obtained by
approximating the Guttman weights by weights of 0, 1, and 2; Scores
obtained by weighting a right answer 2, and a wrong answer 1; Scores
obtained by weighting a right answer 1, and a wrong answer -1/4;
Number right; Number wrong; and Number omitted. The reliabilities and
intercorrelations of these scores are shown in tabular form. It is
concluded that most of the advantages of Guttman weights come from
weighting "omits" less than "wrong" answers. Because instructions in
common use encourage people to omit items on a test, Guttman
weighting is not recommended since with these weights the best
strategy is to guess. (DB)
THE SENSITIVITY OF GUTTMAN WEIGHTS



Bert F. Green, Jr. 

The Johns Hopkins University 

The method of reciprocal averages (Mitzel & Hoyt, 1954) and
Guttman's scaling metric (Guttman, 1941) have been around for a
long time, yet they are not much used in practice. It is much
quicker for a human scorer to count the number right than to add
up a lot of weights. Today's computers and test scoring machinery
have removed this onus; it costs the computer less than a
millisecond to use weights rather than counts. The renewed interest in
differential weighting, e.g., Stanley & Wang (1970), is thus a trend
of our mechanized times.
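
The paper does not spell out how the weights are computed. As a rough illustration only, the classical reciprocal-averages iteration can be sketched as follows: each option's weight is the mean score of the people who chose it, each person's score is the mean weight of the options chosen, and the two steps alternate until they stop changing. The function name, the integer coding of options (with "omit" treated as one more option code), the starting scores, and the fixed iteration count are all assumptions made for this sketch, and the cross-validation step used in the study is not shown.

```python
import numpy as np

def reciprocal_averages(choices, n_options, n_iter=100):
    """Alternate between option weights and person scores.

    choices: (n_people, n_items) integer array; each entry is the option
             code (0 .. n_options-1) a person picked on that item.
    Returns (weights, scores); weights has shape (n_items, n_options).
    """
    choices = np.asarray(choices)
    n_people, n_items = choices.shape
    # Arbitrary non-constant starting scores (an assumption of this sketch).
    scores = np.arange(n_people, dtype=float)
    weights = np.zeros((n_items, n_options))
    for _ in range(n_iter):
        # Standardize so the trivial "everyone equal" solution is excluded.
        scores = (scores - scores.mean()) / scores.std()
        for item in range(n_items):
            for option in range(n_options):
                chose = choices[:, item] == option
                # Weight = mean score of the people who chose this option.
                weights[item, option] = scores[chose].mean() if chose.any() else 0.0
        # Score = mean weight of the options each person chose.
        scores = weights[np.arange(n_items), choices].mean(axis=1)
    return weights, scores

# Toy data: three people always pick option 0, three always pick option 1.
demo = np.array([[0, 0, 0]] * 3 + [[1, 1, 1]] * 3)
demo_weights, demo_scores = reciprocal_averages(demo, n_options=2)
```

On the toy data the two response patterns are pushed to opposite ends of the scale, which is the behavior the weighting is meant to produce.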

The motivation for differential weighting of item options is 
clear. A score of 1 or 0 on an aptitude test item is small pay-off 
for the student's extreme mental anguish in answering the question. 
Could we not milk some extra information from his answer? If he 
gets the item right, that is that, but if he chooses one of the 
distractors, might he not deserve partial credit for choosing a 
"better" distractor? Skeptics argue there is very little to be
gained, especially with a reasonably homogeneous test, and that 
there is more pay-off in better items than in better ways of scoring 
present items. 

It is well known that differential weighting is at best a
second-order improvement. The Likert method of attitude scaling
is a case in point (see Green, 1954). Respondents rate each statement
"strongly agree," "agree," "indifferent," "disagree," or "strongly
disagree," and get scored 5, 4, 3, 2, or 1 for their effort.
Originally Likert proposed choosing item weights for maximum
discrimination among the respondents, but found that the simpler
5-4-3-2-1 method yielded scores correlating in the high 90's with
the best weighted scores. The added precision of the scoring method
was simply lost in the large error variance of individual item
responses.

Green (1969) recently discussed the conditions under which 
differential regression weights are reasonable, arguing that simple 
one-digit weights are virtually indistinguishable from 3-digit beta 
weights. Applications of earlier results due to Wilks & Scheffé
permit stating a multidimensional interval of indistinguishability of
weighting schemes. 

The present context is very similar to the regression situation, 
in that the comparison is between equally weighted distractors (all 
0) and differentially weighted distractors. But zero weights are
a very special case, and amount to adding or dropping information
rather than simply weighting it differently. The usual theorems do
not apply to zero weights.

It is clear empirically that reliability can be significantly
improved by Guttman weighting; the remaining question concerns how
seriously to take the weights. How much is lost by using one-digit
weights, or a 5-4-3-2-1 scheme? Even more extreme, what about giving
0 for omit, 1 for any wrong distractor, and 2 for the right
answer? (Not, of course, telling the students what we plan to do.)

Theoretical analysis indicates that, as in the case of regression
weights, we will lose almost nothing by using one-digit
weights (0-9), and not much more by using 5-4-3-2-1-0. But analysis
cannot proceed further without the specific data at hand, so we must
turn to the empirical results. Here it would be handy to have an
interactive computer system so that we could try different weighting
schemes on the basis of results of earlier weights, but we
have made do with batch processing.

Empirical Results 

One subtest of the CEEB-SAT-Verbal Test was examined in detail.
The sample is described more fully by Hendricksen (1971),
whose help in obtaining the data is gratefully acknowledged. Our
sample consisted of 2,500 men. Cross-validated Guttman weights
were used. We compared several scores on this group, as follows:

1. Scores obtained from cross-validated Guttman weights 

2. Scores obtained by rounding the Guttman weights to one 
digit, ranging from 0 to 8 

3. Scores obtained by approximating the Guttman weights by
weights of 0, 1, and 2

4. Scores obtained by weighting a right answer 2, and a
wrong answer 1

5. Scores obtained by weighting a right answer 1, and a
wrong answer -1/4

6. Number right 

7. Number wrong 

8. Number omitted 
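
Scores 4 through 8 in this list depend only on each examinee's counts of right, wrong, and omitted items. A minimal Python sketch, assuming each item response has already been coded with the hypothetical labels "R" (right), "W" (wrong), and "O" (omitted); the coding and the function name are illustrative and not taken from the study:

```python
from fractions import Fraction

def simple_scores(responses):
    """Return the count-based scores (scores 4-8) for one examinee.

    responses: sequence of hypothetical codes, one per item:
               "R" = right, "W" = wrong, "O" = omitted.
    """
    right = sum(1 for r in responses if r == "R")
    wrong = sum(1 for r in responses if r == "W")
    omitted = sum(1 for r in responses if r == "O")
    return {
        "2R+W": 2 * right + wrong,            # score 4: right 2, wrong 1, omit 0
        "R-W/4": right - Fraction(wrong, 4),  # score 5: right 1, wrong -1/4
        "R": right,                           # score 6: number right
        "W": wrong,                           # score 7: number wrong
        "O": omitted,                         # score 8: number omitted
    }

# Eight-item example: 3 right, 4 wrong, 1 omitted.
example = simple_scores(["R", "R", "W", "O", "R", "W", "W", "W"])
```

Exact fractions are used for the formula score so that the quarter-point penalty is not blurred by floating-point rounding.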

The reliabilities and intercorrelations of these scores are
shown in Table 1.

It is clear that the first four scores are virtually identical,
correlating at least .98 with each other. Moreover these four
scores have nearly the same reliability. The simple weights lose
a slight amount (.01) but this is not much, compared with the gain
of all these four scores over the formula score (R - 1/4 W) or the
simple number right.

It is worth noting that the three versions of the Guttman
weights, unrounded, slightly rounded and very rounded, all correlate
at least .99 whereas the score 2*R+W correlates only .98
with each. So there is definitely some information in the Guttman
scores that is not in the 2*R+W scores, but not very much. We
must conclude, then, that most of the advantages of Guttman weights
come from weighting omits less than wrong answers.
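
One way to see why the 2*R+W score is sensitive to omits: on a test of n items, R + W + O = n, so 2R + W = R + (R + W) = n + R - O. The score is therefore the number right minus the number omitted, plus a constant. The short Python sketch below checks the identity on a hypothetical 10-item test and then applies it to the means reported in Table 1a; the resulting test length of about 40 items is an inference from those means, not a figure stated in the text.

```python
def two_r_plus_w(right, wrong):
    """Score with right = 2, wrong = 1, omit = 0."""
    return 2 * right + wrong

def n_plus_r_minus_o(n_items, right, omitted):
    """Equivalent form: test length + number right - number omitted."""
    return n_items + right - omitted

# Exhaustive check of the identity on a hypothetical 10-item test.
n_items = 10
identity_holds = all(
    two_r_plus_w(right, wrong)
    == n_plus_r_minus_o(n_items, right, n_items - right - wrong)
    for right in range(n_items + 1)
    for wrong in range(n_items - right + 1)
)

# Means reported in Table 1a: 2*R+W = 57.37, R = 21.20, O = 3.82.
# Rearranging the identity gives n = (2R+W) - R + O.
implied_items = 57.37 - 21.20 + 3.82  # about 40, up to rounding
```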
Table 1a. Reliabilities of several scores obtained from the
CEEB-SAT-Verbal Test.

                                                             Standard
Score                                 Reliability    Mean    Deviation

1. Guttman weights                      .89[?]       12.82   [missing]
2. One-digit Guttman Weights (0-8)      .890        220.37     22.12
3. Simple Guttman Weights (0-1-2)       .880         58.43     10.13
4. 2*R+W                                .879         57.37      9.60
5. R - 1/4W                             .850         17.45      7.87
6. R                                    .855         21.20      6.67
7. W                                  [illegible]    14.98      5.97
8. O                                    .865          3.82      4.40

Table 1b. Intercorrelation of Scores
Discussion

The important fact, obtained from an initial scan of the
Guttman weights, and borne out by subsequent analysis, is that
there is something very special about omits. With the usual
instruction about not wasting time by guessing, it turns out that
people who omit items do more poorly on the test. The omit option
is to be weighted less than the distractors. Most of the gain in
the Guttman weights is in the differential treatment of the omits.
People who omit items do badly on the test, and by weighting the
omit category low, the test reliability is increased.

We need to know more about the properties of the "omit" score.
Reilly and Jackson (1972) have presented indirect evidence that
suggests that the "omit" score is reliable but invalid for graduate
school grades. But the validity of predictors of graduate
performance is generally low. It would be helpful to repeat the study
using the CEEB-SAT, with undergraduate grades as the criterion. If
the omit score is reliable, what does it measure? Incidentally, in
our data it is not the case that the people who omit are too slow.
At least it is not true that the omits are all bunched at the end.
People omit items because they don't know the answer, more often
than they omit items because they don't reach them.

At a practical level, it appears that we cannot ethically use
Guttman weights, because the instructions in common use encourage
omitting items, whereas when Guttman weights are used, the best
strategy is to guess. It is an open question whether Guttman weights
will be useful when applied to data obtained when everyone
guesses if he doesn't know or doesn't have time.
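
The conflict between the instructions and the weights can be made concrete with a little expected-value arithmetic. The sketch below assumes five-option items (suggested by the -1/4 penalty in the formula score, but not stated outright in the text) and a purely random guess; the function name is illustrative. It uses the simple 0-1-2 weights as a stand-in for the Guttman weights, since the actual item-by-item weights are not reported.

```python
from fractions import Fraction

def expected_guess_score(n_options, right_weight, wrong_weight):
    """Expected item score for a blind guess among equally likely options."""
    p_right = Fraction(1, n_options)
    return p_right * right_weight + (1 - p_right) * wrong_weight

OMIT_SCORE = 0  # an omitted item earns 0 under both schemes below

# Formula score: right = 1, wrong = -1/4.
formula_guess = expected_guess_score(5, 1, Fraction(-1, 4))

# Simple 0-1-2 weights: right = 2, any wrong answer = 1.
simple_guess = expected_guess_score(5, 2, 1)
```

Under the formula score a blind guess has the same expected value as an omit, so the usual instructions are fair to the examinee. Under the 0-1-2 weights a blind guess is worth 6/5 on average and never less than 1, so omitting is always the worse choice.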

Perhaps we can develop a separate test on which those who 
omit many items will score well; if so, that test could be used 
as a suppressor variable. 

Presumably the problem would go away with a tailored test
given by a computer. An answer would be required for every item,
and both speed and correctness would be relevant factors. Most
studies of tailored testing (e.g., Lord, 1970; Green, 1970) have
treated only items scored right and wrong. Work is needed on
differential weighting of options, and on the use of second
attempts, in this computer-based situation.

Until we can solve these problems, it seems clear that Guttman
weighting is not to be recommended.